Abstract



We have found the most general extension of the celebrated Sauer, Perles and Shelah, Vapnik and
Chervonenkis result from 0-1 sequences to $k$-ary codes still giving a polynomial bound.

Let $\mathcal{C} \subseteq \{0, 1, \dots, k-1\}^n$ be a $k$-ary code of length $n$. For a subset of coordinates
$S \subseteq \{1, 2, \dots, n\}$ the projection of $\mathcal{C}$ to $S$ is denoted by $\mathcal{C}|_S$. We say that $\mathcal{C}$
$(i,j)$-shatters $S$ if $\mathcal{C}|_S$ contains all the $2^{|S|}$ distinct vectors (codewords) with coordinates $i$ and $j$.
Suppose that $\mathcal{C}$ does not $(i,j)$-shatter any coordinate set of size $s_{i,j} \ge 1$ for every
$0 \le i < j \le k-1$ and let $p = \sum (s_{i,j} - 1)$. Using a natural induction we prove that

$$|\mathcal{C}| \le O(n^p)$$

for any given $p$ as $n \to \infty$ and give a construction showing that this exponent is the best possible.
Several open problems are mentioned.

Keywords: shattering, VC-dimension, forbidden configurations 



1 Introduction 



Let $[n]$ denote the set $\{1, 2, \dots, n\}$ and let $(k)$ denote $\{0, 1, \dots, k-1\}$. For any set $S$, let $2^S$
denote the family of all $2^{|S|}$ subsets of $S$ and let $\binom{S}{k}$ denote the family of all $\binom{|S|}{k}$
subsets of $S$ of size $k$. Consider a family $\mathcal{F}$ of subsets of $[n]$. We say that $\mathcal{F}$ shatters $S$ if

$$\{E \cap S : E \in \mathcal{F}\} = 2^S.$$

The following result has a variety of applications including learning theory and applied probability. 

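Theorem 1 (Sauer [12], Perles, Shelah [13], Vapnik, Chervonenkis [15]) Let $\mathcal{F}$ be a family of subsets
of $[n]$ with no shattered set of size $s$. Then

$$|\mathcal{F}| \le \binom{n}{s-1} + \binom{n}{s-2} + \dots + \binom{n}{0} \qquad (1)$$

and this bound is the best possible.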
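As a small sanity check of the definition of shattering and of the bound $\binom{n}{s-1} + \dots + \binom{n}{0}$ in Theorem 1, the following sketch brute-forces the largest shattered set of a tiny family. The helper names `shatters`, `largest_shattered` and `sauer_bound` are illustrative and do not come from the paper; the approach is exponential and only meant for toy inputs.

```python
from itertools import combinations
from math import comb

def shatters(family, S):
    """True if {E ∩ S : E in family} is the full power set of S."""
    S = frozenset(S)
    traces = {frozenset(E) & S for E in family}
    return len(traces) == 2 ** len(S)

def largest_shattered(family, n):
    """Largest d such that some d-subset of {1..n} is shattered (brute force)."""
    best = 0
    for d in range(1, n + 1):
        if any(shatters(family, S) for S in combinations(range(1, n + 1), d)):
            best = d
    return best

def sauer_bound(n, s):
    """C(n, s-1) + C(n, s-2) + ... + C(n, 0)."""
    return sum(comb(n, i) for i in range(s))

# All subsets of [4] of size at most 1: no set of size 2 is shattered.
n = 4
family = [set()] + [{x} for x in range(1, n + 1)]
d = largest_shattered(family, n)
```

In this toy case the family has exactly `sauer_bound(4, 2)` members, so it is an extremal example for $s = 2$.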

Karpovsky and Milman [10] and independently Steele [14] gave a multivalued generalization of the
result above. Let $\mathcal{C} \subseteq (k)^n$ be a set of codewords (vectors). A codeword $c$ can also be viewed as a
function from $[n]$ to $(k)$. The code $\mathcal{C}$ is said to shatter $S \subseteq [n]$ if

$$\{c|_S : c \in \mathcal{C}\} = (k)^S,$$

the set of all functions from $S$ to $(k)$.

Theorem 2 (Karpovsky and Milman [10] and independently Steele [14] (see also Frankl [5], Alon
[1], Anstee [2])) Let $1 \le s \le n$ be an integer and let $\mathcal{C} \subseteq (k)^n$ be a set of codewords with no shattered
set of size $s$. Then

$$|\mathcal{C}| \le \sum_{i=0}^{s-1} (k-1)^{n-i} \binom{n}{i}. \qquad (2)$$

An important difference between the bounds is that (1) is polynomial in $n$ (for fixed $s$), but (2)
is exponential. The same phenomenon happens when uniform set systems are considered. The
uniform version of Theorem 1 was proven by Frankl and Pach [6] (for a strengthening and algebraic
connections see Anstee et al. [1]).

Theorem 3 (Frankl and Pach [6]) Let $n, d, s$ be positive integers such that $d \le n$ and $s \le n/2$. Let
$\mathcal{F} \subseteq \binom{[n]}{d}$ be a $d$-uniform set system that does not shatter an $s$-element set. Then

$$|\mathcal{F}| \le \binom{n}{s-1}.$$

Recently, Hegedűs and Rónyai [9] gave two multivalued generalizations.

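Theorem 4 (Hegedűs and Rónyai [9]) Let $0 \le d \le (k-1)n$ and $s - 1 \le n/2$. Let $\mathcal{C} \subseteq (k)^n$ be a
code with no shattered set of size $s$ and suppose that $\sum_{i=1}^{n} c_i = d$ for every $c \in \mathcal{C}$. Then

[The displayed bound is not recoverable from the source; the surviving fragments indicate a sum with upper limit $s-1$ and a binomial term with lower index $i-1$.]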

Note that this bound is exponential in n. 

Theorem 5 (Hegedűs and Rónyai [9]) Let $0 \le d \le n$ and $0 \le d + s \le n + 1$. Let $\mathcal{C} \subseteq (k)^n$ be a code
with no shattered set of size $s$ and suppose that $|\{i \in [n] : c_i \ne 0\}| = d$ for every $c \in \mathcal{C}$. Then



One cannot expect an exponential bound here since the total number of codewords with support of 
size d is polynomial. 

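A code $\mathcal{C}$ and the corresponding matrix $M$ formed by the codewords are called reverse-free if $M$
does not have a submatrix of the form

$$\begin{pmatrix} a & b \\ b & a \end{pmatrix}$$

for any distinct $a$ and $b$. How large can a reverse-free code $\mathcal{C} \subseteq (k)^n$ be? It was proved in [7] that

$$\max |\mathcal{C}| = \Theta\left(n^{\binom{k}{2}}\right). \qquad (3)$$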



This can lead to the following version of multivalued shattering. Let $\mathcal{C} \subseteq (k)^n$ be a set of
codewords. $\mathcal{C}$ $(i,j)$-shatters $S \subseteq [n]$ if $\mathcal{C}|_S$ contains all $2^{|S|}$ functions from $S$ to $\{i, j\}$. Let $k \ge 2$
be a fixed integer and let $\mathbf{s} = (s_{0,1}, s_{0,2}, \dots, s_{k-2,k-1})$ be a positive integer vector of length $\binom{k}{2}$
whose entries are indexed by ordered pairs $(i, j)$ with $0 \le i < j \le k-1$.
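The definition of $(i,j)$-shattering can be checked mechanically on small codes. The sketch below is a brute-force illustration only; the function names `projection` and `ij_shatters` are hypothetical, and coordinates are 0-based here whereas the text indexes them from 1.

```python
from itertools import product

def projection(code, S):
    """C|_S: restrict every codeword to the coordinates in S (0-based)."""
    return {tuple(c[t] for t in S) for c in code}

def ij_shatters(code, S, i, j):
    """True if C|_S contains all 2^|S| words over the two symbols i and j."""
    proj = projection(code, S)
    return all(w in proj for w in product((i, j), repeat=len(S)))

# A ternary code of length 3 (k = 3, n = 3).
code = {(0, 0, 2), (0, 1, 2), (1, 0, 2), (1, 1, 0), (2, 2, 2)}

first_two_01 = ij_shatters(code, (0, 1), 0, 1)   # all of {0,1}^2 appears
first_two_02 = ij_shatters(code, (0, 1), 0, 2)   # (0,2) and (2,0) are missing
```

Note that a set may be $(i,j)$-shattered for one pair of symbols and not for another, which is why the theorem below carries a separate threshold $s_{i,j}$ for each pair.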

The main result of the present paper is the following theorem. 



Theorem 6 Suppose that $\mathcal{C} \subseteq (k)^n$ does not $(i,j)$-shatter any coordinate set of size $s_{i,j} \ge 1$ for
every $0 \le i < j \le k-1$. Then

$$|\mathcal{C}| \le \sum_{0 \le a_{i,j} \le s_{i,j}-1} \binom{n}{a_{0,1}, a_{0,2}, \dots, a_{k-2,k-1},\ n - \sum_{0 \le i < j \le k-1} a_{i,j}} = O(n^p), \qquad (4)$$

where the sum is taken for all possible choices of the $a_{i,j}$'s and $p = \sum_{0 \le i < j \le k-1} (s_{i,j} - 1)$.

On the other hand, when $p$ is fixed and $n \to \infty$ there exist codes $\mathcal{C} \subseteq (k)^n$ that do
not $(i,j)$-shatter any coordinate set of size $s_{i,j} \ge 1$ for every $0 \le i < j \le k-1$ and satisfy

$$|\mathcal{C}| = \Omega(n^p). \qquad (5)$$

In other words, if $\mathrm{forb}(n, \mathbf{s})$ denotes the maximum number of codewords of a code $\mathcal{C}$ of length $n$ over
the alphabet $(k)$ such that $\mathcal{C}$ does not $(i,j)$-shatter any coordinate set of size $s_{i,j}$, then

$$\mathrm{forb}(n, \mathbf{s}) = \Theta(n^p).$$
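To get a feeling for the size of the upper bound (4), the following sketch evaluates the multinomial sum numerically. It assumes the sum ranges over all integer choices $0 \le a_{i,j} \le s_{i,j} - 1$, which is the reading suggested by the branching argument in Section 3; the function names and the convention of passing $\mathbf{s}$ as a flat tuple are illustrative choices, not the paper's.

```python
from itertools import product
from math import factorial

def multinomial(n, parts):
    """n! / (a_1! ... a_m! (n - sum a)!), or 0 if the parts exceed n."""
    rest = n - sum(parts)
    if rest < 0:
        return 0
    value = factorial(n) // factorial(rest)
    for a in parts:
        value //= factorial(a)   # each intermediate quotient is an integer
    return value

def theorem6_bound(n, s):
    """Sum of multinomial coefficients over all a with 0 <= a_ij <= s_ij - 1."""
    ranges = [range(sij) for sij in s]
    return sum(multinomial(n, a) for a in product(*ranges))

def exponent(s):
    """p = sum over all pairs of (s_ij - 1)."""
    return sum(sij - 1 for sij in s)

binary_case = theorem6_bound(10, (3,))        # k = 2: reduces to 1 + 10 + 45
ternary_case = theorem6_bound(5, (2, 2, 2))   # k = 3, all thresholds equal to 2
```

For $k = 2$ the sum collapses to the binomial sum of Theorem 1, and when every $s_{i,j} = 1$ it equals 1, matching the observation that two distinct codewords would then shatter a single coordinate.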



2 A hierarchy of Vapnik-Chervonenkis type dimensions 

The VC-dimension of a set system $\mathcal{F} \subseteq 2^{[n]}$ is the maximum $d$ such that $\mathcal{F}$ shatters a set of size $d$.
Theorem 1 bounds the size of a set system whose VC-dimension is less than $s$. Vapnik and Chervonenkis
used it for bounds on the sample size necessary to obtain uniformly good empirical estimates for the
expectations of all random variables of a given class. Since then it has found applications in learning
theory; for example, concept classes with bounded VC-dimension are effectively learnable.

Theorem 2 allows the definition of another dimension, the KM-dimension of codes (systems of
multisets), as follows. The KM-dimension of $\mathcal{C} \subseteq (k)^n$ is the maximum $d$ such that $\mathcal{C}$ shatters a set of size $d$.
Theorem 2 gives a bound on the size of a code of KM-dimension less than $s$. However, this bound is
an exponential function of $n$.

Haussler and Long [8] introduced other generalizations of VC-dimension, motivated by statistical
applications. The G-dimension of $\mathcal{C} \subseteq (k)^n$ is the maximum $d$ such that there exist a vector
$y = (y_1, y_2, \dots, y_d) \in (k)^d$ and a subset $D = \{i_1, i_2, \dots, i_d\} \subseteq [n]$ such that for all subsets $I \subseteq D$
there exists $c = (c_1, c_2, \dots, c_n) \in \mathcal{C}$ such that $c_{i_t} = y_t$ for $i_t \in I$ and $c_{i_t} \ne y_t$ for $i_t \notin I$.

The P-dimension of $\mathcal{C}$ is the maximum $d$ such that there exist a vector $y$ and a subset $D$ of $[n]$ with $|D| = d$
such that for all subsets $I \subseteq D$ there exists $c \in \mathcal{C}$ such that $c_{i_t} \ge y_t$ for $i_t \in I$ and $c_{i_t} < y_t$ for $i_t \notin I$.

The GP-dimension of $\mathcal{C}$ is the maximum $d$ such that there exist a vector $y$ and a subset $D$ of $[n]$ with
$|D| = d$ such that for all subsets $I \subseteq D$ there exists $c \in \mathcal{C}$ such that $c_{i_t} = y_t$ for $i_t \in I$ and $c_{i_t} < y_t$ for
$i_t \notin I$.

Finally, the N-dimension (or Natarajan-dimension) of $\mathcal{C}$ is the maximum $d$ such that there exist
vectors $y$ and $z$ with $z_t < y_t$ for $t = 1, 2, \dots, d$ and a subset $D$ of $[n]$ with $|D| = d$ such that for all subsets
$I \subseteq D$ there exists $c \in \mathcal{C}$ such that $c_{i_t} = y_t$ for $i_t \in I$ and $c_{i_t} = z_t$ for $i_t \notin I$.

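It is easy to see that each of the above dimensions coincides with the VC-dimension in the binary case
$k = 2$. We also have

$$\dim_{KM}(\mathcal{C}) \le \dim_{N}(\mathcal{C}) \le \dim_{GP}(\mathcal{C}) \le \left\{ \begin{array}{l} \dim_{G}(\mathcal{C}) \\ \dim_{P}(\mathcal{C}) \end{array} \right. \qquad (6)$$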



The concept of $(i,j)$-shattering allows us to define a new dimension which is between KM-dimension
and N-dimension.

The bi-dimension of $\mathcal{C} \subseteq (k)^n$ is the maximum $d$ such that there exist $i < j \in (k)$ and a set
$D \subseteq [n]$ of size $d$ such that $\mathcal{C}$ $(i,j)$-shatters $D$. If a set $D$ is KM-shattered by $\mathcal{C}$, then $\mathcal{C}|_D$ is the set
of all functions from $D$ to $(k)$; in particular it contains all functions from $D$ to $\{i, j\}$ for any pair
$i < j \in (k)$, so $D$ is $(i,j)$-shattered by $\mathcal{C}$. This shows

$$\dim_{KM}(\mathcal{C}) \le \dim_{bi}(\mathcal{C}).$$

On the other hand, if $D$ is $(i,j)$-shattered by $\mathcal{C}$, then $D$ satisfies the condition of N-dimension with
vectors $z = (i, i, \dots, i)$ and $y = (j, j, \dots, j)$, so the N-dimension of $\mathcal{C}$ is at least as large as its
bi-dimension.
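The chain KM-dimension $\le$ bi-dimension $\le$ N-dimension can be observed directly on a toy code. The following brute-force sketch is an illustration under stated assumptions: all function names are hypothetical, coordinates are 0-based, and the search is exponential, so it is only usable for very small $n$ and $k$.

```python
from itertools import combinations, product

def proj(code, D):
    """Projection of the code onto the coordinate tuple D."""
    return {tuple(c[t] for t in D) for c in code}

def km_shattered(code, D, k):
    """All k^|D| functions from D to (k) appear."""
    return len(proj(code, D)) == k ** len(D)

def bi_shattered(code, D, k):
    """Some pair i < j has all of {i, j}^D in the projection."""
    p = proj(code, D)
    return any(all(w in p for w in product((i, j), repeat=len(D)))
               for i, j in combinations(range(k), 2))

def n_shattered(code, D, k):
    """Natarajan condition: some z < y (coordinatewise) with every y/z mixture present."""
    p = proj(code, D)
    pairs = list(combinations(range(k), 2))      # candidate (z_t, y_t) with z_t < y_t
    return any(all(w in p for w in product(*choice))
               for choice in product(pairs, repeat=len(D)))

def dimension(code, n, k, shattered):
    """Largest size of a coordinate set passing the given shattering test."""
    return max((d for d in range(1, n + 1)
                for D in combinations(range(n), d)
                if shattered(code, D, k)), default=0)

code = {(0, 1), (0, 2), (1, 1), (1, 2)}           # k = 3, n = 2
km = dimension(code, 2, 3, km_shattered)
bi = dimension(code, 2, 3, bi_shattered)
nat = dimension(code, 2, 3, n_shattered)
```

In this example the three values are strictly increasing: the two coordinates use different symbol pairs, so the pair of coordinates satisfies the Natarajan condition but is not $(i,j)$-shattered for any single pair.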

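Let $\mathcal{M}_X(n, s)$ denote the maximum size of a code of length $n$ and X-dimension not exceeding $s$
($X \in \{KM, bi, N, GP, G, P\}$). Then (6) and the observations above imply

$$\left. \begin{array}{r} \mathcal{M}_{G}(n, s) \\ \mathcal{M}_{P}(n, s) \end{array} \right\} \le \mathcal{M}_{GP}(n, s) \le \mathcal{M}_{N}(n, s) \le \mathcal{M}_{bi}(n, s) \le \mathcal{M}_{KM}(n, s).$$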

In fact, Haussler and Long [8] proved that

$$\mathcal{M}_{G}(n, s) = \mathcal{M}_{P}(n, s) = \mathcal{M}_{GP}(n, s) = \sum_{0 \le i \le s} \binom{n}{i} (k-1)^i,$$

$$\mathcal{M}_{N}(n, s) \le \sum_{0 \le i \le s} \binom{n}{i} \binom{k}{2}^i.$$

These bounds are polynomial in $n$. Theorem 6 implies that $\mathcal{M}_{bi}(n, s)$ is polynomial as well, since
$\mathcal{M}_{bi}(n, s) = \mathrm{forb}(n, \mathbf{s})$ for the vector $\mathbf{s}$ whose coordinates are all $s + 1$. However, $\mathcal{M}_{KM}(n, s)$ is
exponential according to Theorem 2. An extremal property of bi-dimension is that it is the weakest
restriction that still results in a polynomial bound. Indeed, if there is a pair of symbols $i, j$ such that
there is no restriction involving only that pair, then one can select all codewords $\mathcal{C} = \{i, j\}^n$ so that
$\mathcal{C}$ does not violate any restrictions yet it is of exponential size.

3 Proofs 

In this section we give two versions of the proof of the upper bound in Theorem 6. The lower bound
(5) follows from Proposition 7.

Branching proof. Let $\mathcal{C} \subseteq (k)^n$ be a code avoiding an $(i,j)$-shattered set of size $s_{i,j}$ for all
$0 \le i < j \le k-1$. The following branching process will be applied to $\mathcal{C}$ successively $n$ times.

Let $B$ be a set of codewords of length $t \ge 1$ over the alphabet $(k)$. Let $B_0$ denote the set of suffixes
of length $t - 1$ of codewords in $B$. Note that if $t = 1$, then $B_0$ has one element, the empty string. If
a codeword $b \in B_0$ appears with more than one first coordinate in $B$, say with $i_1 < i_2 < \dots < i_w$,
then $b$ will be put into the $(w - 1)$ sets $B_{i_1,i_2}, B_{i_1,i_3}, \dots, B_{i_1,i_w}$. We get

$$|B| = |B_0| + \sum_{0 \le i < j \le k-1} |B_{i,j}|.$$

$B_{i,j}$ is said to be obtained by $(i,j)$-branching at step $t$ from $B$.

Thus, the process starts with $B = \mathcal{C}$ and $t = n$, and continues with $t = n-1, n-2, \dots, 1$. At
step $t$ every set of codewords obtained at step $t + 1$ is branched. At the end, there are $|\mathcal{C}|$ singleton
sets each containing the empty string. For an example see Figure 1.

Figure 1: Branching example

Every singleton set is a result of a series of branchings, say $a_{i,j}$ $(i,j)$-branchings for $0 \le i < j \le k-1$.
If $a_{i,j} \ge s_{i,j}$ for some pair $i, j$, and these branchings occur at steps $t_1, t_2, \dots, t_{a_{i,j}}$, then $\mathcal{C}$
$(i,j)$-shatters the set $\{t_1, t_2, \dots, t_{a_{i,j}}\}$, which contradicts the assumptions. The maximum possible
number of singleton sets with $a_{i,j}$ $(i,j)$-branchings is equal to the number of $n$-permutations of $a_{i,j}$
objects of type $(i,j)$ for $0 \le i < j \le k-1$ and $n - \sum_{0 \le i < j \le k-1} a_{i,j}$ objects of "no branching" type,
which is exactly the multinomial coefficient

$$\binom{n}{a_{0,1}, a_{0,2}, \dots, a_{k-2,k-1},\ n - \sum_{0 \le i < j \le k-1} a_{i,j}}.$$

This provides the upper bound (4). □
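The branching process is easy to simulate, which makes the counting identity $|B| = |B_0| + \sum |B_{i,j}|$ and the final count of $|\mathcal{C}|$ singleton sets concrete. The sketch below is one possible encoding, not the authors' implementation: codewords are Python tuples, and each leaf carries a `Counter` recording how many $(i,j)$-branchings produced it. The names `branch` and `leaves` are hypothetical.

```python
from collections import Counter, defaultdict

def branch(B):
    """One branching step: split words of length t by their first coordinate."""
    firsts = defaultdict(set)              # suffix -> set of first symbols seen
    for word in B:
        firsts[word[1:]].add(word[0])
    B0 = set(firsts)                       # all suffixes of length t - 1
    Bij = defaultdict(set)                 # (i1, iw) -> suffixes put into B_{i1,iw}
    for suffix, symbols in firsts.items():
        i1, *others = sorted(symbols)
        for iw in others:
            Bij[(i1, iw)].add(suffix)
    return B0, dict(Bij)

def leaves(code, n):
    """Run all n steps; return one Counter of (i, j)-branchings per singleton leaf."""
    level = [(set(code), Counter())]
    for _ in range(n):
        nxt = []
        for B, hist in level:
            B0, Bij = branch(B)
            nxt.append((B0, hist))
            for pair, sub in Bij.items():
                nxt.append((sub, hist + Counter({pair: 1})))
        level = nxt
    return [hist for _, hist in level]

code = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}   # k = 3, n = 2
B0, Bij = branch(code)
result = leaves(code, 2)
```

Here one leaf records two $(0,1)$-branchings, reflecting the fact that this code $(0,1)$-shatters both coordinates together.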

Induction proof. Let $\mathcal{C}^{i}_{i,j} \subseteq \mathcal{C}$ consist of those codewords $c$ for which $c_n = i$ and there exists a
codeword $c' \in \mathcal{C}$ that differs from $c$ only in the last coordinate and has $c'_n = j$. $\mathcal{C}^{j}_{i,j} \subseteq \mathcal{C}$ is defined
similarly. If $s_{i,j} = 1$, then both $\mathcal{C}^{i}_{i,j}$ and $\mathcal{C}^{j}_{i,j}$ are empty. Otherwise, let $\mathbf{s}_{i,j}$ be the vector obtained
from $\mathbf{s}$ by decreasing the $(i,j)$th coordinate by one. Then obviously $|\mathcal{C}^{i}_{i,j}| = |\mathcal{C}^{j}_{i,j}| \le \mathrm{forb}(n-1, \mathbf{s}_{i,j})$.
Let $\mathcal{C}|_{[n-1]} = \{c|_{[n-1]} : c \in \mathcal{C}\}$ be the set of length $n - 1$ prefixes of codewords in $\mathcal{C}$. Clearly,
$|\mathcal{C}|_{[n-1]}| \le \mathrm{forb}(n-1, \mathbf{s})$. On the other hand,

$$|\mathcal{C}| \le |\mathcal{C}|_{[n-1]}| + \sum_{0 \le i < j \le k-1} |\mathcal{C}^{i}_{i,j}|. \qquad (7)$$

In order to prove (4) using induction we have to give an upper bound for $\mathrm{forb}(1, \mathbf{s})$. In this case $i$ and $j$
can both be codewords in $\mathcal{C}$ iff $s_{i,j} > 1$. Let $G_{\mathbf{s}} = ((k), E)$ be the graph on vertex set $(k)$ defined
by $\{i, j\} \in E \iff s_{i,j} > 1$. Then

$$\mathrm{forb}(1, \mathbf{s}) = \omega(G_{\mathbf{s}}). \qquad (8)$$

It is an easy exercise that the right hand side of (4) is an upper bound for this clique number in case
of $n = 1$. The bound in (4) follows from (7) and (8) using induction and the well-known recurrence
for the multinomial coefficients. □
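The base case of the induction can be illustrated numerically. For $n = 1$ a code is just a set of symbols, and it is admissible exactly when it forms a clique in the graph with edges $\{i,j\}$ for $s_{i,j} > 1$. The sketch below computes that clique number by brute force and compares it with the right-hand side of (4) at $n = 1$, which, assuming the sum ranges over $0 \le a_{i,j} \le s_{i,j} - 1$, consists of one term for the all-zero choice and one term per edge. The dictionary encoding of $\mathbf{s}$ and the function names are illustrative assumptions.

```python
from itertools import combinations

def clique_number(k, s):
    """omega(G_s) for the graph on {0..k-1} with edge {i, j} iff s[(i, j)] > 1."""
    def is_clique(V):
        return all(s[(i, j)] > 1 for i, j in combinations(V, 2))
    return max(size for size in range(1, k + 1)
               for V in combinations(range(k), size) if is_clique(V))

def rhs_at_n1(s):
    """Right-hand side of (4) for n = 1: the all-zero choice plus one term per edge."""
    return 1 + sum(1 for sij in s.values() if sij > 1)

# k = 4; the pairs with threshold 1 are non-edges of the graph.
s = {(0, 1): 2, (0, 2): 3, (1, 2): 2, (0, 3): 1, (1, 3): 1, (2, 3): 2}
omega = clique_number(4, s)
bound = rhs_at_n1(s)
```

A clique on $\omega$ vertices contains at least $\omega - 1$ edges, so $\omega \le 1 + |E|$, which is the easy exercise mentioned in the proof.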



4 Forbidden configurations 

Another generalization or sharpening of Theorem 1 considers forbidden configurations. We say a
(0,1)-matrix is simple if there are no repeated rows. Given a (0,1)-matrix $F$, we say a matrix $A$
has $F$ as a configuration, denoted $F \in A$, if there is a submatrix of $A$ which is a row and column
permutation of $F$. Let $|A|$ denote the number of rows of matrix $A$. We define

$$\mathrm{forb}(n, F) = \max\{|A| : A \text{ is a simple (0,1)-matrix with } n \text{ columns and without configuration } F\}. \qquad (9)$$

A simple (0,1)-matrix $A$ naturally corresponds to a set system $\mathcal{F}_A$ taking the rows as characteristic
vectors of subsets of $[n]$. $\mathcal{F}_A$ shatters an $s$-set iff $A$ has the $2^s \times s$ configuration of all distinct rows
of size $s$.

The concept of forbidden configurations can be extended to matrices with entries from $(k)$. A
$(k)$-matrix is simple if there are no repeated rows. Given a $(k)$-matrix $F$, we say a matrix $A$ has $F$ as
a configuration, denoted $F \in A$, if there is a submatrix of $A$ which is a row and column permutation
of $F$. Theorem 2 gives an upper bound on $m$ for an $m \times n$ simple $(k)$-matrix that does not have the
$k^s \times s$ configuration of all distinct rows of size $s$.

Definition (9) of $\mathrm{forb}(n, F)$ can be applied to $(k)$-matrices as well. However, if polynomial
upper bounds are desired, then more than one configuration must be forbidden simultaneously. Let
$\mathcal{F} = \{F_1, F_2, \dots, F_t\}$ be a collection of (not necessarily simple) $(k)$-matrices. Let

$$\mathrm{forb}(n, k, \mathcal{F}) = \max\{m : A \text{ is an } m \times n \text{ simple } (k)\text{-matrix and has no configuration } F \in \mathcal{F}\}.$$

In [7] it was proved that

$$\mathrm{forb}(n, k, \mathcal{F}) = \Theta\left(n^{\binom{k}{2}}\right) \quad \text{for} \quad \mathcal{F} = \left\{ \begin{pmatrix} a & b \\ b & a \end{pmatrix} : 0 \le a < b \le k-1 \right\}.$$



Theorem 6 can also be reformulated in this language. Let $F$ be a (0,1)-matrix; then $F(i,j)$ denotes
the $(i,j)$-matrix obtained from $F$ by replacing 0's by $i$'s and 1's by $j$'s. Let $K_s$ denote the $2^s \times s$
(0,1)-matrix of all distinct rows of size $s$. Theorem 6 gives bounds for $\mathrm{forb}(n, k, \mathcal{F})$ where
$\mathcal{F} = \{K_{s_{i,j}}(i,j) : 0 \le i < j \le k-1\}$. Here we prove a lower bound.

Proposition 7 Let $F^{i,j}$, $0 \le i < j \le k-1$, be simple (0,1)-matrices such that none of them contains
a constant column. Then

$$\mathrm{forb}\left(n, k, \{F^{i,j}(i,j) : 0 \le i < j \le k-1\}\right) \ge \prod_{0 \le i < j \le k-1} \mathrm{forb}\left(\frac{n}{\binom{k}{2}}, F^{i,j}\right). \qquad (10)$$



Proof: We apply the product construction introduced in [3]. Let $A^{i,j}$ be a simple (0,1)-matrix with
$n / \binom{k}{2}$ columns and $\mathrm{forb}\left(n / \binom{k}{2}, F^{i,j}\right)$ rows without configuration $F^{i,j}$. Let

$$A = A^{0,1} \times A^{0,2} \times \dots \times A^{k-2,k-1}$$

be the matrix with $n$ columns and $|A^{0,1}| \cdot |A^{0,2}| \cdot \ldots \cdot |A^{k-2,k-1}|$ rows obtained by choosing one row
from each of the matrices and putting them side by side in every possible way, where in the block
coming from $A^{i,j}$ the 0's are replaced by $i$'s and the 1's by $j$'s. We claim that this
product matrix $A$ avoids all configurations $F^{i,j}(i,j)$, $0 \le i < j \le k-1$. Indeed, since each column of $F^{i,j}(i,j)$
contains both symbols $i$ and $j$, the columns of a configuration $F^{i,j}(i,j)$ must come from the columns of $A^{i,j}$ in
the product. Suppose $F^{i,j}$ has $p$ columns. Since $F^{i,j}$ is simple and $A^{i,j}$ does not have configuration
$F^{i,j}$, for each $p$-tuple of columns of $A^{i,j}$ there must be a row of $F^{i,j}$ that is missing on those columns.
This row will be missing in the product matrix as well. □
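The product construction can be tried out on a small instance. The sketch below takes $k = 3$ and forbids $K_2(i,j)$ for every pair, using as building block the chain matrix with rows $1^a 0^{m-a}$, which is one simple (0,1)-matrix without configuration $K_2$. The choice of block, the value $m = 2$ and all function names are illustrative assumptions; the check at the end is a brute-force search for an $(i,j)$-shattered pair of columns.

```python
from itertools import combinations, product

def chain_matrix(m):
    """(m+1) x m 0-1 matrix with rows 1^a 0^(m-a); it has no K_2 configuration."""
    return [tuple(1 if t < a else 0 for t in range(m)) for a in range(m + 1)]

def relabel(A, i, j):
    """A(i, j): replace 0 by i and 1 by j."""
    return [tuple(j if x else i for x in row) for row in A]

def product_matrix(blocks):
    """All ways of putting one row from each block side by side."""
    return [sum(rows, ()) for rows in product(*blocks)]

def ij_shatters(rows, S, i, j):
    """True if the rows restricted to columns S contain all words over {i, j}."""
    proj = {tuple(r[t] for t in S) for r in rows}
    return all(w in proj for w in product((i, j), repeat=len(S)))

k, m = 3, 2
pairs = list(combinations(range(k), 2))
A = product_matrix([relabel(chain_matrix(m), i, j) for i, j in pairs])
n = m * len(pairs)

# Column pairs that are (i, j)-shattered for some symbol pair; expected to be none.
bad = [(S, i, j) for i, j in pairs
       for S in combinations(range(n), 2) if ij_shatters(A, S, i, j)]
```

With these parameters the product has $(m+1)^3$ distinct rows on $3m$ columns, which grows like $n^3$, matching $p = \sum (s_{i,j} - 1) = 3$ for $s_{i,j} = 2$.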

Lower bound (5) follows by taking $F^{i,j} = K_{s_{i,j}}$, $0 \le i < j \le k-1$, and applying Theorem 1.






5 Open problems 

There are more questions than answers known in connection with $(i,j)$-shattering. The principal
problem is that Theorem 6 does not give sharp bounds, in contrast with Theorem 1 and Theorem 2.
We can give an exact bound only if most of the $s_{i,j}$'s are ones.

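Proposition 8 Assume that $s_{i,j} = 1$ if $i < j < k-1$. Then

$$\mathrm{forb}(n, \mathbf{s}) = \max_{n_0 + n_1 + \dots + n_{k-2} = n} \ \prod_{i=0}^{k-2} \left( \binom{n_i}{s_{i,k-1}-1} + \binom{n_i}{s_{i,k-1}-2} + \dots + \binom{n_i}{0} \right).$$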

Proof: Suppose that $A$ is a simple $(k)$-matrix without configurations $K_{s_{i,j}}(i,j)$. $s_{i,j} = 1$ means that symbols
$i$ and $j$ cannot occur in the same column of $A$. Thus the columns of $A$ can be partitioned into $k - 1$
parts, part $C_i$ containing only symbols $i$ and $k - 1$ for $0 \le i < k-1$. The number of different
projections onto column set $C_i$ is at most $\binom{n_i}{s_{i,k-1}-1} + \binom{n_i}{s_{i,k-1}-2} + \dots + \binom{n_i}{0}$ for $n_i = |C_i|$ by Theorem 1. Thus
the maximum number of different rows of $A$ is at most

$$\prod_{i=0}^{k-2} \left( \binom{n_i}{s_{i,k-1}-1} + \binom{n_i}{s_{i,k-1}-2} + \dots + \binom{n_i}{0} \right).$$

On the other hand the product construction (10) provides a matching lower bound. □
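The quantity appearing in the proof of Proposition 8 can be evaluated for small parameters. The sketch below assumes the exact bound is the maximum, over all ways of splitting the $n$ columns into parts of sizes $n_0, \dots, n_{k-2}$, of the product of the Theorem 1 bounds for the parts; this is the reading suggested by the proof, and the function names and the tuple `last` holding $(s_{0,k-1}, \dots, s_{k-2,k-1})$ are illustrative.

```python
from math import comb, prod

def sauer(n, s):
    """C(n, s-1) + C(n, s-2) + ... + C(n, 0)."""
    return sum(comb(n, i) for i in range(s))

def compositions(n, parts):
    """All tuples of `parts` nonnegative integers summing to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

def prop8_value(n, last):
    """Max over n_0 + ... + n_{k-2} = n of the product of sauer(n_i, s_{i,k-1})."""
    return max(prod(sauer(ni, si) for ni, si in zip(ns, last))
               for ns in compositions(n, len(last)))

binary = prop8_value(7, (3,))       # k = 2: a single part, plain Theorem 1 bound
ternary = prop8_value(4, (2, 2))    # k = 3: best split is 2 + 2 columns
```

For $k = 3$ and thresholds $(2, 2)$ the product is $(1 + n_0)(1 + n_1)$, so the balanced split maximizes it, as the second example shows.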



It would be interesting to find exact bounds for other special cases as well.

Another question is whether containing no constant column, or simplicity of the forbidden
configurations, is a necessary condition in Proposition 7. Also, Proposition 7 and Theorem 6 give asymptotically
tight bounds if $\mathrm{forb}(n, F^{i,j}) = \Theta(n^{s_{i,j}-1})$ where $s_{i,j}$ is the number of columns of $F^{i,j}$. The question
is whether Proposition 7 gives the correct order of magnitude of $\mathrm{forb}(n, k, \mathcal{F})$ for other lists $\mathcal{F}$ of
forbidden configurations.

Since VC-dimension has many applications in statistics, computer science and combinatorics,
it seems likely that bi-dimension can be applied there, too.